Conversation
The distributed embedding examples use custom train step functions. In my understanding, distributed embedding does NOT work with the Keras `model.fit` function. I think we need the distributed embedding team to review the PR.
FDecaYed left a comment:
Looks good to me.
On model.fit support:
The current code using model.fit should work, since nothing in distributed-embeddings conflicts with Keras model fit/compile.
The reason we have a custom train_step() example is for when the user wants hybrid data/model parallelism. The way Horovod data parallelism supports the model.fit() API is by wrapping the optimizer, which currently breaks if distributed model parallelism is also used. To my understanding, merlin-models also integrates with Horovod data parallelism through DistributedOptimizer? If so, users could run into problems when they use both integrations.
We will support model.fit + hvd.DistributedOptimizer in the next release, so the code here should just work. One caveat is that the fix (and the later version of DE) will depend on a later Horovod version.
Alternatively, I think merlin-models would need to implement a custom train_step in the block/model using DE. That would be a much bigger change, though.
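For reference, here is a minimal sketch of the hybrid data/model-parallel training loop that the custom train_step() example refers to, following the distributed-embeddings usage pattern. The names `model`, `loss_fn`, and `optimizer` are illustrative placeholders, and the exact `dist_model_parallel` import path is an assumption:

```python
# Illustrative sketch only (not the final Models integration): hybrid
# data/model parallelism with a custom train_step instead of wrapping the
# optimizer with hvd.DistributedOptimizer.
import tensorflow as tf
import horovod.tensorflow as hvd
import distributed_embeddings.python.layers.dist_model_parallel as dmp

hvd.init()

# `model`, `loss_fn`, and `optimizer` are assumed to be defined elsewhere.
# Embedding tables inside `model` are sharded across ranks by
# distributed-embeddings; the dense layers are replicated (data parallel).

@tf.function
def train_step(inputs, labels, first_batch):
    with tf.GradientTape() as tape:
        outputs = model(inputs, training=True)
        loss = loss_fn(labels, outputs)
    # dmp's tape allreduces only the data-parallel (dense) gradients;
    # model-parallel embedding gradients stay local to each rank.
    tape = dmp.DistributedGradientTape(tape)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    if first_batch:
        # Broadcast initial weights so every rank starts from the same
        # state (dmp's variant of hvd.broadcast_variables).
        dmp.broadcast_variables(model.variables, root_rank=0)
    return loss
```

The point of the sketch is that the gradient allreduce happens through the tape wrapper rather than through an optimizer wrapper, which is why combining distributed model parallelism with hvd.DistributedOptimizer (and therefore plain model.fit) is problematic until the fix mentioned above lands.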
The PR needs to be updated based on the dataloader changes. There is a new version of DE. We need to add an integration test as well to be sure that the functionality is working.
@FDecaYed hello, do you have any updates for this PR? Thanks.
@rnyak Sorry, this fell off my list. On the other hand, I'm not familiar with merlin models and the dataloader change you mentioned. @edknv, do you know what it is, and could you help bring the code up to date?
Part of NVIDIA-Merlin/Merlin#733.
Goals ⚽
There is a package called `distributed-embeddings`, a library for building large embedding-based (e.g., recommender) models in TensorFlow. It is an alternative approach to SOK.
This PR introduces `DistributedEmbedding` for multi-GPU embedding table support.

Implementation Details 🚧
- `distributed-embeddings` by default will round-robin the entire embedding tables across the GPUs, e.g., the first embedding table on GPU 1, the second one on GPU 2, etc. It also supports a `column_slice` option, but this has not been tested thoroughly from the Models side.
- The new `DistributedEmbedding` class handles reading `int_domain` (similarly to the existing `EmbeddingTable`), determining shapes, and translating a dictionary input into an ordered list input (because `distributed-embeddings` doesn't support dictionaries yet).
- Users can replace `mm.Embeddings` with `mm.DistributedEmbeddings` in their models when they wish to use multi-GPU embedding tables. (See the unit test for DLRM, and the usage sketch after the Testing Details section below.)
- `distributed-embeddings` is for now installed via a script that clones the GitHub repo and installs from source, because there is no PyPI package.

Testing Details 🔍
- Unit tests: `tests/unit/tf/horovod/test_embedding.py`
- Performance tests: TBD
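To illustrate the intended usage, here is a hypothetical sketch of swapping `mm.Embeddings` for `mm.DistributedEmbeddings` in a DLRM model. The constructor arguments (`dim`, the `embeddings` argument of `DLRMModel`), the dataset path, and the `"click"` target column are assumptions for illustration; the unit test above is the authoritative reference:

```python
import merlin.models.tf as mm
from merlin.io import Dataset
from merlin.schema import Tags

# Hypothetical sketch; exact argument names may differ from the final API.
train = Dataset("train/*.parquet")  # placeholder path to a Merlin dataset
schema = train.schema

# Multi-GPU embedding tables: use mm.DistributedEmbeddings where a
# single-GPU model would use mm.Embeddings.
embeddings = mm.DistributedEmbeddings(
    schema.select_by_tag(Tags.CATEGORICAL), dim=16
)

model = mm.DLRMModel(
    schema,
    embeddings=embeddings,
    bottom_block=mm.MLPBlock([32, 16]),  # output dim matches embedding dim
    top_block=mm.MLPBlock([32, 16]),
    prediction_tasks=mm.BinaryClassificationTask("click"),
)
model.compile(optimizer="adam")
model.fit(train, batch_size=1024)
```

In a multi-GPU setting this script would be launched once per GPU, e.g. with `horovodrun -np <num_gpus>`, so that `distributed-embeddings` can shard the tables across the ranks.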